Performance Improvement of DAG-Aware Task Scheduling Algorithms with Efficient Cache Management in Spark


Abstract

Directed acyclic graph (DAG)-aware task scheduling algorithms have been studied extensively in recent years, and these algorithms have achieved significant performance improvements in data-parallel analytic platforms. However, current DAG-aware algorithms, among which HEFT and GRAPHENE are notable, pay little attention to the cache management policy, which plays a vital role in in-memory systems such as Spark. Cache policies that were designed for Spark perform poorly under DAG-aware task scheduling, which leads to cache misses and performance degradation. In this study, we propose a new cache management policy known as Long-Running Stage Set First (LSF), which makes full use of task dependencies to optimize cache management for DAG-aware scheduling algorithms. LSF calculates the caching and prefetching priorities of resilient distributed datasets (RDDs) according to their unprocessed workloads and their significance in parallel scheduling, which are key factors in DAG-aware scheduling. Moreover, we present a cache-aware task scheduling algorithm based on LSF to reduce resource fragmentation in computing. Experiments demonstrate that, compared with DAG-aware scheduling algorithms using LRU and MRD, the same algorithms using LSF improve the job completion time (JCT) by up to 42% and 30%, respectively. The proposed cache-aware scheduling algorithm also exhibits about a 12% reduction in average job completion time compared with DAG-aware scheduling using LSF.
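The idea of ranking cached RDDs by a priority score, rather than by recency as LRU does, can be sketched as follows. This is a minimal illustration only: the `CachedRDD` fields, the `priority` scoring rule (unprocessed work multiplied by a parallelism weight), and the `evict_until_fits` helper are all hypothetical names and assumptions introduced here, not the formula or implementation from the paper, and the sketch does not use any Spark API.

```python
from dataclasses import dataclass


@dataclass
class CachedRDD:
    name: str
    size_mb: int
    unprocessed_work: float  # hypothetical: remaining work of stages that still read this RDD
    parallel_weight: float   # hypothetical: how much those stages matter to parallel scheduling


def priority(rdd: CachedRDD) -> float:
    # Assumed scoring rule for illustration; the paper's actual formula is not given here.
    return rdd.unprocessed_work * rdd.parallel_weight


def evict_until_fits(cache, capacity_mb):
    """Drop the lowest-priority RDDs until the cache fits; return (kept, evicted)."""
    kept = sorted(cache, key=priority, reverse=True)
    evicted = []
    while kept and sum(r.size_mb for r in kept) > capacity_mb:
        evicted.append(kept.pop())  # the last element has the lowest priority
    return kept, evicted


cache = [
    CachedRDD("rdd_a", 400, 120.0, 1.0),  # priority 120.0
    CachedRDD("rdd_b", 300, 10.0, 0.5),   # priority 5.0
    CachedRDD("rdd_c", 500, 60.0, 1.5),   # priority 90.0
]
kept, evicted = evict_until_fits(cache, capacity_mb=1000)
kept_names = [r.name for r in kept]
evicted_names = [r.name for r in evicted]
```

With these made-up numbers, `rdd_b` is evicted because little remaining work depends on it, regardless of how recently it was accessed.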


Related resources

Task-aware Scheduling Algorithms

A common pattern in the architectures of modern interactive web-services is that of large request fan-outs, where even a single end-user request (task) arriving at an application server triggers tens to thousands of data accesses (sub-tasks) to different stateful backend servers. The overall response time of each task is bottlenecked by the completion time of the slowest sub-task, making such w...


Efficient DAG Scheduling with Resource-Aware Clustering for Heterogeneous Systems

Task scheduling on Heterogeneous Distributed Computing Systems (HeDCSs) with the aim of improving efficiency and reducing execution time is of paramount importance. In this paper, a novel task scheduling algorithm called Resource-Aware Clustering (RAC) for Directed Acyclic Graphs (DAGs) is proposed. The objective of this algorithm is to keep the relative load balancing and efficiency increase bet...


Lightweight Task Analysis for Cache-Aware Scheduling on Heterogeneous Clusters

We present a novel characterization of how a program stresses cache. This characterization permits fast performance prediction in order to simulate and assist task scheduling on heterogeneous clusters. It is based on the estimation of stack distance probability distributions. The analysis requires the observation of a very small subset of memory accesses, and yields a reasonable to very accurat...


Selection and replacement algorithms for memory performance improvement in Spark

As a parallel computation framework, Spark can cache the partitions of repeatedly used resilient distributed datasets (RDDs) on different nodes to speed up computation. However, Spark does not have a good mechanism for selecting which RDDs should have their partitions cached in limited memory. In this paper, we propose a novel selection algorithm, by which Spark can automatically select the RDDs to ...


Cache-aware Scheduling with Limited Preemptions

In safety-critical applications, the use of advanced real-time scheduling techniques is significantly limited by the difficulty of finding tight estimations of worst-case execution parameters. This problem is further complicated by the use of cache memories, which reduce the predictability of the executing threads due to cache misses. In this paper, we analyze the effects of preemptions on wors...



Journal

Journal title: Electronics

Year: 2021

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics10161874